Abstract

We consider M-ary signal transmission over the AWGN channel with additive Q-ary
interference, where the sequence of i.i.d. interference symbols is known causally at the
transmitter. Shannon's theorem for channels with side information at the transmitter is used
to formulate the capacity of the channel. It is shown that the capacity is achievable by using
at most $MQ - Q + 1$ out of the $M^Q$ input symbols of the associated channel. For the special
case where the Gaussian noise power is zero, a sufficient condition, which is independent of
the interference, is given for the capacity to be $\log_2 M$ bits per channel use. We then
investigate the maximization of the transmission rate under the constraint that the channel
input, given any current interference symbol, is uniformly distributed over the channel input
alphabet. For this setting, the general structure of a communication system with optimal
precoding is proposed. The extension of the proposed precoding scheme to a continuous channel
input alphabet is also investigated.

Index Terms 

Causal side information, interference, channel capacity, precoding, linear programming, 
integer programming. 
I. Introduction

Information transmission over channels with known interference at the transmitter
has been a major focus of research due to its application in various communication
problems. A remarkable result on such channels was obtained by Costa, who showed that
the capacity of the additive white Gaussian noise (AWGN) channel with additive Gaussian
i.i.d. interference, where the sequence of interference symbols is known non-causally
at the transmitter, is the same as the capacity of the AWGN channel [1]. Therefore, the
interference does not incur any loss in capacity. This result was extended to arbitrary
interference (random or deterministic) by Erez et al. [2]. Following the title of Costa's
paper, "Writing on dirty paper" [1], coding strategies for the channel with non-causally
known interference at the transmitter are referred to as "dirty paper coding" (DPC).

Transmission over multiple-input multiple-output (MIMO) broadcast channel is an 
important application of DPC. In such systems, for a given user, the signals sent to the 
other users are considered as interference. Since all signals are known to the transmitter, 
dirty paper coding can be used after some linear preprocessing [3]. It was shown that 
DPC in fact achieves the sum capacity of the MIMO broadcast channel [4], [5], [6]. 
Most recently, it has been shown that the same is true for the entire capacity region of 
the MIMO broadcast channel [7]. Another important application of DPC is information 
embedding or watermarking [8], [9], [10], where a host signal is modeled as interference 
onto which a watermark signal is embedded. 

The result obtained by Costa does not hold when the sequence of interference
symbols is known only causally at the transmitter. In fact, the capacity is unknown
in this case and, unlike the non-causal knowledge setting, it depends on the
interference. The only definitive result in this case is due to Erez et al. [2], who showed
that, for the worst-case interference, in the limit of high SNR, the loss in capacity due to
not having the future samples of the interference at the transmitter is exactly the ultimate
shaping gain $\frac{1}{2}\log_2\left(\frac{2\pi e}{12}\right) \approx 0.254$ bit.
In this paper, we consider the AWGN channel with i.i.d. additive discrete interference
where the sequence of interference symbols is known causally at the transmitter. The
discrete interference model is more appropriate for many practical applications. For
example, in the MIMO broadcast channel, since in practice the user signals
are chosen from finite constellations, the interference caused by the other users is discrete
rather than continuous. We are interested in both the capacity of the channel and precoding
schemes for the channel.

The rest of the paper is organized as follows. In Section II, we provide some
background on channels with side information at the encoder. In Section III, we introduce
our channel model. In Section IV, we investigate the capacity of the channel. In Section
V, we consider maximizing the transmission rate under the constraint that the channel
input, given any current interference symbol, is uniformly distributed over the channel input
alphabet. The general structure of a communication system for the channel with causally
known discrete interference is given in Section VI. We extend the uniform transmission
scheme to a continuous input alphabet in Section VII. We conclude the paper in the final
section.

II. Channels with Side Information at the Transmitter 

Channels with known interference at the transmitter are a special case of channels
with side information at the transmitter, which were first considered by Shannon [11].

Shannon considered a discrete memoryless channel (DMC) whose transition matrix
depends on the channel state. A state-dependent discrete memoryless channel (SD-DMC)
is defined by a finite input alphabet $\mathcal{X}$, a finite output alphabet $\mathcal{Y}$,
and transition probabilities $p(y|x,s)$, where the state $s$ takes on values in a finite
alphabet $\mathcal{S}$. The block diagram of a state-dependent channel with state
information at the encoder is shown in Fig. 1.

We may consider two settings for the knowledge of the state sequence at the encoder:
causal or non-causal. In the causal knowledge setting, the encoder maps a message $w$
into $\mathcal{X}^n$ such that the channel input at time $i$ is a function of the message
$w$ and the state sequence up to time $i$, $i = 1, 2, \ldots, n$, whereas in the non-causal
knowledge setting, the encoder observes the entire state sequence to generate every symbol
of the code sequence.

Fig. 1. SD-DMC with state information at the encoder.

Fig. 2. The associated regular DMC.

Shannon considered the case where the i.i.d. state sequence is known causally at the 
encoder and obtained the capacity formula [11]. The case where the i.i.d. state sequence 
is known non-causally at the encoder was considered by Kuznetsov and Tsybakov in the 
context of coding for memories with defective cells [12]. Gel'fand and Pinsker obtained 
the capacity formula for this case [13]. 

Shannon's capacity formula was generalized by Salehi [14] to the case where a noisy
version of the state sequence is available at both the encoder and the decoder. Caire and
Shamai [15] investigated the case where the state sequence is not memoryless. The capacity
results with non-causal side information at the encoder were generalized to the case where
rate-limited side information is available at both the encoder and the decoder [16], [17].
Shannon [11] showed that the capacity of an SD-DMC where the i.i.d. state sequence
is known causally at the encoder is equal to the capacity of an associated regular (without
state) DMC with an extended input alphabet $\mathcal{T}$ and the same output alphabet
$\mathcal{Y}$. The input alphabet of the associated channel is the set of all functions from
the state alphabet to the input alphabet of the state-dependent channel. There are a total of
$|\mathcal{X}|^{|\mathcal{S}|}$ such functions, where $|\cdot|$ denotes the cardinality of a
set. Any of the functions can be represented by a $|\mathcal{S}|$-tuple
$(x_1, x_2, \ldots, x_{|\mathcal{S}|})$ of elements of $\mathcal{X}$, implying that the value
of the function at state $s$ is $x_s$, $s = 1, 2, \ldots, |\mathcal{S}|$.

The transition probabilities for the associated channel are given by [11]

$$p(y|t) = \sum_{s=1}^{|\mathcal{S}|} p(s)\, p(y|x_s, s), \tag{1}$$

where $t$ denotes the function represented by $(x_1, x_2, \ldots, x_{|\mathcal{S}|})$. Also,

$$p(y(1) \cdots y(n) \,|\, t(1) \cdots t(n)) = \prod_{i=1}^{n} p(y(i)|t(i)), \tag{2}$$

where $i$ denotes the time index. The capacity is given by [11]

$$C = \max_{p(t)} I(T;Y), \tag{3}$$

where the maximization is taken over the probability mass function (pmf) of the random
variable $T$.

Any encoding and decoding scheme for the associated channel can be translated
into an encoding and decoding scheme for the original state-dependent channel with the
same probability of error [11]. An encoder for the associated channel encodes a message
$w$ to $(t(1), \ldots, t(n))$. The translated encoding scheme for the original
state-dependent channel maps the message $w$ to $(x(1), x(2), \ldots, x(n))$, where $x(i)$
is the $s$th component of $t(i)$ if the state at time $i$ is $s$,
$s = 1, 2, \ldots, |\mathcal{S}|$, and $i = 1, 2, \ldots, n$. The block diagram
of the associated regular DMC is shown in Fig. 2.

In the capacity formula (3), we can alternatively replace the random variable $T$
with $(X_1, \ldots, X_{|\mathcal{S}|})$, where $X_s$ is the random variable that represents
the input to the state-dependent channel when the state is $s$,
$s = 1, \ldots, |\mathcal{S}|$.
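Shannon's construction can be illustrated with a short sketch. The code below enumerates all $|\mathcal{X}|^{|\mathcal{S}|}$ strategies of a toy SD-DMC and evaluates (1) for each of them. The function names and the toy channel (a binary input with a clean state and a noisy state) are hypothetical and chosen only for illustration.

```python
from itertools import product

def associated_channel(input_alphabet, state_pmf, transition):
    """Build the associated (state-free) DMC of Shannon's construction.

    input_alphabet: list of input symbols x
    state_pmf:      list with p(s) for states s = 0, ..., |S|-1
    transition:     transition[s][x] is a dict {y: p(y|x,s)}
    Returns {t: {y: p(y|t)}}, where t = (x_1, ..., x_|S|) is a strategy.
    """
    num_states = len(state_pmf)
    channel = {}
    for t in product(input_alphabet, repeat=num_states):
        out = {}
        for s, x in enumerate(t):
            # Equation (1): p(y|t) = sum_s p(s) p(y|x_s, s)
            for y, p in transition[s][x].items():
                out[y] = out.get(y, 0.0) + state_pmf[s] * p
        channel[t] = out
    return channel

def encode_symbol(t, state):
    """Translate a strategy t into a channel input, given the current state."""
    return t[state]

# Toy example (hypothetical numbers): state 0 is noiseless,
# state 1 flips the binary input with probability 0.1.
toy_transition = [
    {0: {0: 1.0}, 1: {1: 1.0}},
    {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}},
]
toy_channel = associated_channel([0, 1], [0.5, 0.5], toy_transition)
```

Each key of `toy_channel` is one input symbol of the associated channel, and `encode_symbol` is the translation step that maps it back to an input of the state-dependent channel once the current state is known.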
III. The Channel Model

We consider data transmission over the channel

$$Y = X + S + N, \tag{4}$$

where $X$ is the channel input, which takes on values in a fixed real constellation

$$\mathcal{X} = \{x_1, x_2, \ldots, x_M\}, \tag{5}$$

$Y$ is the channel output, $N$ is additive white Gaussian noise with power $P_N$, and the
interference $S$ is a discrete random variable that takes on values in

$$\mathcal{S} = \{s_1, s_2, \ldots, s_Q\} \tag{6}$$

with probabilities $r_1, r_2, \ldots, r_Q$, respectively. The sequence of i.i.d. interference
symbols is known causally at the encoder.

The above channel can be considered as a special case of the state-dependent channels
considered by Shannon, with one exception: the channel output alphabet is continuous.
In our case, the likelihood function $f_{Y|X,S}(y|x,s)$ is used instead of the transition
probabilities. We denote the input to the associated channel by $T$, which can also be
represented as $(X_1, X_2, \ldots, X_Q)$, where $X_j$ is the random variable that represents
the channel input when the current interference symbol is $s_j$, $j = 1, \ldots, Q$.

The likelihood function for the associated channel is given by

$$f_{Y|T}(y|t) = \sum_{j=1}^{Q} r_j f_{Y|X,S}(y|x_{i_j}, s_j)
             = \sum_{j=1}^{Q} r_j f_N(y - x_{i_j} - s_j), \tag{7}$$

where $f_N$ denotes the pdf of the Gaussian noise $N$, and $t$ is the input symbol of the
associated channel represented by $(x_{i_1}, x_{i_2}, \ldots, x_{i_Q})$. The pdf of $Y$ is
then given by

$$f_Y(y) = \sum_{i_1=1}^{M} \cdots \sum_{i_Q=1}^{M} p_{i_1 i_2 \cdots i_Q}
           \left( \sum_{j=1}^{Q} r_j f_N(y - x_{i_j} - s_j) \right)
         = \sum_{j=1}^{Q} \sum_{i=1}^{M} r_j\, p_i^{(j)} f_N(y - x_i - s_j), \tag{8}$$

where $p_{i_1 i_2 \cdots i_Q} = \Pr\{X_1 = x_{i_1}, \ldots, X_Q = x_{i_Q}\}$ and
$p_i^{(j)} = \Pr\{X_j = x_i\}$.
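The step from the joint pmf to the marginal pmfs in the expression for $f_Y(y)$ can be spelled out as follows. This is a sketch of the intermediate algebra, assuming the mixture form of the likelihood function given above: for each fixed $j$, the summand depends on the index $i_j$ only, so the sums over all other indices collapse the joint pmf into the marginal of $X_j$.

```latex
\begin{align*}
f_Y(y) &= \sum_{i_1=1}^{M} \cdots \sum_{i_Q=1}^{M} p_{i_1 \cdots i_Q}
          \sum_{j=1}^{Q} r_j f_N(y - x_{i_j} - s_j) \\
       &= \sum_{j=1}^{Q} r_j \sum_{i_j=1}^{M} f_N(y - x_{i_j} - s_j)
          \Bigl( \sum_{i_k :\, k \ne j} p_{i_1 \cdots i_Q} \Bigr) \\
       &= \sum_{j=1}^{Q} \sum_{i=1}^{M} r_j\, p_i^{(j)} f_N(y - x_i - s_j).
\end{align*}
```

This is why $f_Y(y)$, and with it $h(Y)$, depends on the joint pmf only through its marginals.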



IV. The Capacity

The capacity of the associated channel, which is the same as the capacity of the
original state-dependent channel, is the maximum of $I(T;Y) = I(X_1 X_2 \cdots X_Q; Y)$ over
the joint pmf values $p_{i_1 i_2 \cdots i_Q}$, i.e.,

$$C = \max_{p_{i_1 i_2 \cdots i_Q}} I(X_1 X_2 \cdots X_Q; Y). \tag{9}$$

The mutual information between $T$ and $Y$ is the difference between the differential
entropies $h(Y)$ and $h(Y|T)$. It can be seen from (8) that $f_Y(y)$, and hence $h(Y)$, is
uniquely determined by the marginal pmfs $\{p_i^{(j)}\}_{i=1}^{M}$, $j = 1, \ldots, Q$. The
conditional entropy $h(Y|T)$ is given by

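$$\begin{aligned}
h(Y|T) &= h(Y|X_1 X_2 \cdots X_Q) \\
       &= \sum_{i_1=1}^{M} \cdots \sum_{i_Q=1}^{M} p_{i_1 \cdots i_Q}\,
          h(Y|X_1 = x_{i_1}, \ldots, X_Q = x_{i_Q}) \\
       &= \sum_{i_1=1}^{M} \cdots \sum_{i_Q=1}^{M} p_{i_1 \cdots i_Q}\, h_{i_1 \cdots i_Q},
\end{aligned} \tag{10}$$

where $h_{i_1 \cdots i_Q} = h(Y|X_1 = x_{i_1}, \ldots, X_Q = x_{i_Q})$.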

There are $M^Q$ variables involved in the maximization problem (9). Each variable
represents the probability of an input symbol of the associated channel. The following
theorem bounds the number of nonzero variables required to achieve the maximum in (9).






Theorem 1: The capacity of the associated regular channel is achieved by using at
most $MQ - Q + 1$ out of the $M^Q$ inputs with nonzero probabilities.

Proof: Denote by $\{\hat{p}_i^{(j)}\}_{i=1}^{M}$ the pmf of $X_j$, $j = 1, 2, \ldots, Q$,
induced by a capacity-achieving joint pmf
$\{\hat{p}_{i_1 \cdots i_Q}\}_{i_1, \ldots, i_Q = 1}^{M}$. We limit the search for a
capacity-achieving joint pmf to the set of joint pmfs that yield the same marginal pmfs as
$\{\hat{p}_{i_1 \cdots i_Q}\}$. By limiting the search to this set, the maximum of
$I(X_1 \cdots X_Q; Y)$ remains unchanged (since the capacity-achieving joint pmf
$\{\hat{p}_{i_1 \cdots i_Q}\}$ is in the new set). But all joint pmfs in the new set yield
the same $h(Y)$, since they induce the same marginal pmfs on $X_1, \ldots, X_Q$.
Therefore, the maximization problem in (9) reduces to the linear minimization problem

$$\begin{aligned}
\min_{p_{i_1 \cdots i_Q}} \quad & \sum_{i_1=1}^{M} \cdots \sum_{i_Q=1}^{M} h_{i_1 \cdots i_Q}\, p_{i_1 \cdots i_Q} \\
\text{subject to} \quad
& \sum_{i_2=1}^{M} \cdots \sum_{i_Q=1}^{M} p_{i_1 \cdots i_Q} = \hat{p}_{i_1}^{(1)}, \quad i_1 = 1, 2, \ldots, M, \\
& \qquad \vdots \\
& \sum_{i_1=1}^{M} \cdots \sum_{i_{Q-1}=1}^{M} p_{i_1 \cdots i_Q} = \hat{p}_{i_Q}^{(Q)}, \quad i_Q = 1, 2, \ldots, M, \\
& p_{i_1 \cdots i_Q} \ge 0, \quad i_1, \ldots, i_Q = 1, 2, \ldots, M.
\end{aligned} \tag{11}$$

There are $MQ$ equality constraints in (11), out of which $MQ - Q + 1$ are linearly
independent. From the theory of linear programming, the minimum of (11), and hence the
maximum of $I(X_1 \cdots X_Q; Y)$, is achieved by a feasible solution with at most
$MQ - Q + 1$ nonzero variables. ■
Theorem 1 states that at most $MQ - Q + 1$ out of the $M^Q$ inputs of the associated
channel need to be used with positive probability to achieve the capacity. However,
in general, one does not know which of the inputs must be used to achieve the capacity. If
we knew the marginal pmfs for $X_1, \ldots, X_Q$ induced by a capacity-achieving joint pmf,
we could obtain the capacity-achieving joint pmf itself by solving the linear program (11).
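The count of linearly independent marginal constraints used in the proof of Theorem 1 can be checked numerically for small cases. The following sketch builds the zero-one matrix of the $MQ$ marginal constraints and computes its rank exactly over the rationals. The function names are hypothetical, and the check covers only the specific small $(M, Q)$ pairs one chooses to run.

```python
from fractions import Fraction
from itertools import product

def marginal_constraint_matrix(M, Q):
    """One row per (coordinate j, symbol index i); one column per Q-tuple."""
    tuples = list(product(range(M), repeat=Q))
    rows = []
    for j in range(Q):
        for i in range(M):
            # Entry is 1 when the j-th coordinate of the tuple equals i.
            rows.append([Fraction(1 if t[j] == i else 0) for t in tuples])
    return rows

def rank(rows):
    """Matrix rank by exact Gauss-Jordan elimination over the rationals."""
    rows = [r[:] for r in rows]
    rk = 0
    for c in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][c] != 0:
                factor = rows[r][c] / rows[rk][c]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk
```

For instance, `rank(marginal_constraint_matrix(3, 2))` evaluates the rank for $M = 3$, $Q = 2$, which can be compared with $MQ - Q + 1 = 5$. The $Q - 1$ dependencies arise because the rows of each coordinate sum to the same all-one row.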






A. The Noise-Free Channel 

We consider the special case where the noise power is zero in (4). In the absence
of noise, the channel output $Y$ takes on at most $MQ$ different values, since different
$(X, S)$ pairs may yield the same sum. If $Y$ takes on exactly $MQ$ different values, then it
is easy to see that the capacity is $\log_2 M$ bits (see footnote 1): the decoder just needs
to partition the set of all possible channel output values into $M$ subsets of size $Q$
corresponding to the $M$ possible inputs, and decide which subset the current received symbol
belongs to.

In general, where the number of distinct channel output symbols can be less than
$MQ$, we will show that under some condition on the channel input alphabet, there exists
a coding scheme that achieves the rate $\log_2 M$ in one use of the channel. We do this
by considering a one-shot coding scheme which uses only $M$ (out of $M^Q$) inputs of the
associated channel.

In a one-shot coding scheme, a message is encoded to a single input of the associated
channel. Any input of the associated channel can be represented by a $Q$-tuple composed
of elements of $\mathcal{X}$. Given that the current interference symbol is $s_j$, the $j$th
element of the $Q$-tuple is sent through the channel. Therefore, a single message can result
in (up to) $Q$ symbols at the output. For convenience, we consider the output symbols
corresponding to a single message as a multi-set (see footnote 2) of size (exactly) $Q$. If
the $M$ multi-sets at the output corresponding to $M$ different messages are mutually
disjoint, reliable transmission through the channel is possible.

Footnote 1: This is true even if the interference sequence is unknown to the encoder.

Footnote 2: A multi-set differs from a set in that each member may have a multiplicity
greater than one. For example, {1, 3, 3, 7} is a multi-set of size four where 3 has
multiplicity two.

Unfortunately, we cannot always find $M$ inputs of the associated channel such that
the corresponding multi-sets are mutually disjoint. For example, consider a channel with
the input alphabet $\mathcal{X} = \{0, 1, 2, 4\}$ and the interference alphabet
$\mathcal{S} = \{0, 1, 3\}$. It is easy to check that for this channel we cannot find four
triples composed of elements of $\mathcal{X}$ such that the corresponding multi-sets are
mutually disjoint. In fact, by entropy calculations, we can show that the capacity of the
channel in this example is less than 2 bits.
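The claim about the example alphabets can be verified by exhaustive search. The sketch below is a brute-force check, not part of the paper's method, and the function name is hypothetical. It uses the observation that mutual disjointness forces each interference level to use every input symbol exactly once, so each coordinate of the $M$ tuples must be a permutation of the input alphabet.

```python
from itertools import permutations, product

def find_disjoint_multisets(xs, ss):
    """Search for M Q-tuples of inputs whose noise-free output multi-sets
    are mutually disjoint. Returns a list of M tuples, or None if none exist."""
    M, Q = len(xs), len(ss)
    # The first coordinate can be fixed to the given ordering of xs without
    # loss of generality; the other Q-1 coordinates range over permutations.
    for perms in product(permutations(xs), repeat=Q - 1):
        tuples = [(xs[m],) + tuple(p[m] for p in perms) for m in range(M)]
        # Output values reachable by each message (one per interference level).
        outputs = [{x + s for x, s in zip(t, ss)} for t in tuples]
        if all(outputs[a].isdisjoint(outputs[b])
               for a in range(M) for b in range(a + 1, M)):
            return tuples
    return None
```

Calling `find_disjoint_multisets([0, 1, 2, 4], [0, 1, 3])` exhausts all candidates for the example above, while replacing the input alphabet with the arithmetic progression `[0, 1, 2, 3]` yields a valid set of four triples. The search is exponential in $Q$ and is meant only for small alphabets.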

However, if we impose some constraint on the channel input alphabet, the rate
$\log_2 M$ is achievable.

Theorem 2: Suppose that the elements of the channel input alphabet $\mathcal{X}$ form an
arithmetic progression. Then the capacity of the noise-free channel

$$Y = X + S, \tag{12}$$

where the sequence of interference symbols is known causally at the encoder, equals
$\log_2 M$ bits.

Proof: Let $\mathcal{Y}^{(q)}$ be the set of all possible outputs of the noise-free channel
when the interference symbol is $s_q$, i.e.,

$$\mathcal{Y}^{(q)} = \{x_1 + s_q, x_2 + s_q, \ldots, x_M + s_q\}, \quad q = 1, \ldots, Q. \tag{13}$$

The union of the sets $\mathcal{Y}^{(q)}$ is the set of all possible outputs of the
noise-free channel.

Without loss of generality, we can assume that $s_1 < s_2 < \cdots < s_Q$. The elements
of $\mathcal{Y}^{(q)}$ form an arithmetic progression, $q = 1, \ldots, Q$. Furthermore, these
$Q$ arithmetic progressions are shifted versions of each other.

We prove by induction on $Q$ that there exist $M$ mutually disjoint multi-sets of size
$Q$ composed of the elements of $\mathcal{Y}^{(1)}, \mathcal{Y}^{(2)}, \ldots, \mathcal{Y}^{(Q)}$
(one element from each). If we can find such $M$ multi-sets of size $Q$, then we can obtain
the corresponding $M$ $Q$-tuples of elements of $\mathcal{X}$ by subtracting the corresponding
interference terms from the elements of the multi-sets. These $M$ $Q$-tuples can serve as the
inputs of the associated channel to be used for sending any of $M$ distinct messages through
the channel without error in one use of the channel, hence achieving the rate $\log_2 M$ bits
per channel use.

For $Q = 1$, the statement of the theorem is true since we can take
$\{x_1 + s_1\}, \{x_2 + s_1\}, \ldots, \{x_M + s_1\}$ as mutually disjoint sets of size one.

Assume that there exist $M$ mutually disjoint multi-sets of size $Q = q$. For $Q = q + 1$,
we have the new set of channel outputs
$\mathcal{Y}^{(q+1)} = \{x_1 + s_{q+1}, x_2 + s_{q+1}, \ldots, x_M + s_{q+1}\}$.
We consider two possible cases:



Fig. 3. The elements of $\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(q+1)}$ shown as shifted
versions of each other. The elements of $\mathcal{Y}^{(q+1)}$ up to $x_k + s_{q+1}$ appear in
$\mathcal{Y}^{(j)}$.



Case 1: None of the elements of $\mathcal{Y}^{(q+1)}$ appear in any of the multi-sets of
size $Q = q$.

In this case, we include the elements of $\mathcal{Y}^{(q+1)}$ in the $M$ multi-sets
arbitrarily (one element is included in each multi-set). It is obvious that the resulting
multi-sets of size $Q = q + 1$ are mutually disjoint.

Case 2: Some of the elements of $\mathcal{Y}^{(q+1)}$ appear in some of the multi-sets of
size $Q = q$.

Suppose that the largest element of $\mathcal{Y}^{(q+1)}$ which appears in any of the sets
$\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(q)}$ (or equivalently, in any of the multi-sets of
size $Q = q$) is $x_k + s_{q+1}$ for some $1 \le k \le M - 1$. Then, since
$\mathcal{Y}^{(q+1)}$ is a shifted version of each of
$\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(q)}$, and $s_{q+1} > s_q > \cdots > s_1$, exactly
one of the sets $\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(q)}$, say $\mathcal{Y}^{(j)}$ for
some $1 \le j \le q$, contains all elements of $\mathcal{Y}^{(q+1)}$ up to $x_k + s_{q+1}$
(see Fig. 3). Since each of the disjoint multi-sets of size $q$ contains just one element of
$\mathcal{Y}^{(j)}$, the elements of $\mathcal{Y}^{(q+1)}$ up to $x_k + s_{q+1}$ appear in
different multi-sets of size $Q = q$. We can form the disjoint multi-sets of size $q + 1$ by
including these common elements in the corresponding multi-sets and including the elements of
$\{x_{k+1} + s_{q+1}, \ldots, x_M + s_{q+1}\}$ in the remaining multi-sets arbitrarily. ■






The condition on the channel input alphabet in the statement of Theorem 2 is a
sufficient condition for the channel capacity to be $\log_2 M$. However, it is not a necessary
condition. For example, the statement of Theorem 2 holds without that condition for the
case $Q = 2$, because in the second induction step we do not need the arithmetic progression
condition to form $M$ mutually disjoint multi-sets of size two.

It is worth mentioning that in the proof of Theorem 2 we did not use the assumption
that the interference sequence is i.i.d. In fact, the interference sequence could be an
arbitrarily varying sequence of the elements of $\mathcal{S}$.

The proof of Theorem 2 is actually a constructive algorithm for finding $M$ (out of
$M^Q$) inputs of the associated channel to be used with probability $\frac{1}{M}$ to achieve
the rate $\log_2 M$ bits.

It is interesting to see that the set containing the $q$th elements of the $M$ $Q$-tuples
obtained by the constructive algorithm is $\mathcal{X}$, $q = 1, \ldots, Q$. This is due to
the fact that each multi-set contains one element from each of
$\mathcal{Y}^{(1)}, \ldots, \mathcal{Y}^{(Q)}$. Therefore, a uniform distribution
on the $M$ $Q$-tuples induces uniform distributions on $X_1, \ldots, X_Q$.
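The induction in the proof of Theorem 2 can be turned into code fairly directly. The following is a minimal sketch of that constructive procedure, assuming integer-valued alphabets so that equality of output values can be tested exactly. The function name is hypothetical, and the "arbitrary" placements of the proof are resolved here simply by list order.

```python
def build_input_tuples(xs, ss):
    """Grow M mutually disjoint output multi-sets one interference level at a
    time, then convert them to Q-tuples of channel inputs.
    Raises ValueError if two common elements fall into the same multi-set,
    which the proof rules out when xs is an arithmetic progression."""
    xs, ss = sorted(xs), sorted(ss)
    M = len(xs)
    multisets = [[x + ss[0]] for x in xs]          # base case Q = 1
    for s in ss[1:]:
        new_outputs = [x + s for x in xs]
        # Map every output value already used to the multi-set that owns it.
        owner = {y: idx for idx, ms in enumerate(multisets) for y in ms}
        assigned, free_elems = {}, []
        for y in new_outputs:
            if y in owner:                          # Case 2: common element
                if owner[y] in assigned:
                    raise ValueError("no disjoint extension found")
                assigned[owner[y]] = y
            else:                                   # element not seen before
                free_elems.append(y)
        free_sets = [i for i in range(M) if i not in assigned]
        for i, y in zip(free_sets, free_elems):     # arbitrary placement
            assigned[i] = y
        for i in range(M):
            multisets[i].append(assigned[i])
    # Subtract the interference terms to recover the channel inputs.
    return [tuple(y - s for y, s in zip(ms, ss)) for ms in multisets]
```

For `xs = [0, 1, 2, 3]` and `ss = [0, 1, 3]`, every coordinate of the returned tuples is a permutation of the input alphabet, in line with the observation above that a uniform distribution on the $M$ tuples induces uniform marginals.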

V. Uniform Transmission 

In the sequel, we study the maximization of the rate $I(X_1 \cdots X_Q; Y)$ over joint
pmfs $\{p_{i_1 \cdots i_Q}\}_{i_1, \ldots, i_Q = 1}^{M}$ that induce uniform marginal
distributions on $X_1, \ldots, X_Q$, i.e.,

$$p_i^{(1)} = p_i^{(2)} = \cdots = p_i^{(Q)} = \frac{1}{M}, \quad i = 1, 2, \ldots, M, \tag{14}$$

for which we show how to obtain the optimal input probability assignment. We call
a transmission scheme that induces uniform distributions on $X_1, \ldots, X_Q$ uniform
transmission. Uniform distributions for $X_1, \ldots, X_Q$ imply a uniform distribution for
$X$, the input to the state-dependent channel defined in (4).

In the previous section, we established that the capacity-achieving pmf for the
asymptotic case of the noise-free channel induces uniform distributions on
$X_1, \ldots, X_Q$ (provided that we can find $M$ $Q$-tuples such that the corresponding
multi-sets are mutually disjoint). Therefore, imposing the uniformity constraint given in
(14) does not reduce the transmission rate in the asymptotic case of the noise-free channel.
However, in the general case where the noise power is not zero, there will be some loss in
rate due to imposing the uniformity constraint.

Imposing the uniformity constraint along with the integrality constraint (which will
be explained later in this section), however, simplifies the encoding operation for the
associated channel, as will be shown in this section. Furthermore, we will show in Section
VII that our precoding scheme with both uniformity and integrality constraints provides
higher rates than the existing modulo precoding scheme of [2].

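Considering the uniformity constraints in (14), the maximization of
$I(X_1 \cdots X_Q; Y)$ is reduced to the linear minimization problem

$$\begin{aligned}
\min_{p_{i_1 \cdots i_Q}} \quad & \sum_{i_1=1}^{M} \cdots \sum_{i_Q=1}^{M} h_{i_1 \cdots i_Q}\, p_{i_1 \cdots i_Q} \\
\text{subject to} \quad
& \sum_{i_2=1}^{M} \cdots \sum_{i_Q=1}^{M} p_{i_1 \cdots i_Q} = \frac{1}{M}, \quad i_1 = 1, 2, \ldots, M, \\
& \qquad \vdots \\
& \sum_{i_1=1}^{M} \cdots \sum_{i_{Q-1}=1}^{M} p_{i_1 \cdots i_Q} = \frac{1}{M}, \quad i_Q = 1, 2, \ldots, M, \\
& p_{i_1 \cdots i_Q} \ge 0, \quad i_1, \ldots, i_Q = 1, 2, \ldots, M.
\end{aligned} \tag{15}$$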

The equality constraints of (15) can be interpreted as follows. We assign
$p_{i_1 \cdots i_Q}$ to the element $(i_1, \ldots, i_Q)$ of an
$M \times M \times \cdots \times M$ ($Q$ times) array. For $Q = 2$, the
equality constraints of (15) mean that every row and every column of the array adds up
to $\frac{1}{M}$. For $Q > 2$, the equality constraints can be interpreted accordingly.

The same argument used in the last part of the proof of Theorem 1 can be used
to show that the maximum rate with the uniformity constraint is achieved by using at most
$MQ - Q + 1$ inputs of the associated channel with positive probabilities. This is restated
in the following corollary.

Corollary 1: The maximum of $I(X_1 \cdots X_Q; Y)$ over joint pmfs
$\{p_{i_1 \cdots i_Q}\}_{i_1, \ldots, i_Q = 1}^{M}$ that induce uniform marginal
distributions on $X_1, X_2, \ldots, X_Q$ is achieved by a joint pmf
with at most $MQ - Q + 1$ nonzero elements.

This result is independent of the coefficients $\{h_{i_1 \cdots i_Q}\}$. However, which
probability assignment with at most $MQ - Q + 1$ nonzero elements is optimal depends on the
coefficients $\{h_{i_1 \cdots i_Q}\}$. The coefficient $h_{i_1 \cdots i_Q}$ is determined by
the interference levels $s_1, \ldots, s_Q$, the probabilities of the interference levels
$r_1, \ldots, r_Q$, the noise power $P_N$, and the signal points $x_1, x_2, \ldots, x_M$.
The optimal probability assignment is obtained by solving the linear programming problem
(15) using the simplex method [19].

A. Two-Level Interference 

If the number of interference levels is two, i.e., $Q = 2$, we can make a stronger
statement than Corollary 1.

Theorem 3: The maximum of $I(X_1 X_2; Y)$ over $\{p_{i_1 i_2}\}_{i_1, i_2 = 1}^{M}$ with
uniform marginal pmfs for $X_1$ and $X_2$ is achieved by using exactly $M$ out of the $M^2$
inputs of the associated channel, each with probability $\frac{1}{M}$.

Proof: The equality constraints of (15) can be written in matrix form as

$$\mathbf{A}\mathbf{p} = \mathbf{1}, \tag{16}$$

where $\mathbf{A}$ is a zero-one $MQ \times M^Q$ matrix, $\mathbf{p}$ is $M$ times the vector
containing all $p_{i_1 \cdots i_Q}$'s in lexicographical order, and $\mathbf{1}$ is the
all-one $MQ \times 1$ vector.

For $Q = 2$, it is easy to check that $\mathbf{A}$ is the vertex-edge incidence matrix of
$K_{M,M}$, the complete bipartite graph with $M$ vertices in each part. Therefore,
$\mathbf{A}$ is a totally unimodular matrix (see footnote 3) [18]. Hence, the extreme points
of the feasible region
$F = \{\mathbf{p} : \mathbf{A}\mathbf{p} = \mathbf{1}, \mathbf{p} \ge \mathbf{0}\}$
are integer vectors. Since the optimal value of a linear optimization problem is attained at
one of the extreme points of its feasible region, the minimum in (15) is achieved at an
all-integer vector $\mathbf{p}^*$. Considering that $\mathbf{p}^*$ satisfies (16), it can only
be a zero-one vector with exactly $M$ ones. ■

Footnote 3: A totally unimodular matrix is a matrix for which every square submatrix has
determinant 0, 1, or -1.

Fig. 4. Optimal solution for 4-PAM input with parameters $r_1 = r_2 = \frac{1}{2}$,
$s_1 = -2$, $s_2 = +2$, $P_N = 1$.
As an example, the optimal solution for a channel with
$\mathcal{X} = \{-3, -1, +1, +3\}$ and $\mathcal{S} = \{-2, +2\}$ with equiprobable
interference symbols is illustrated in Fig. 4. The points circled in the array correspond to
the inputs of the associated channel that must be chosen with probability $\frac{1}{4}$ in
order to achieve the maximum rate in the uniform transmission scenario.

Fig. 5 depicts the maximum mutual information (for the uniform transmission
scenario) vs. SNR for the channel with $\mathcal{X} = \mathcal{S} = \{-1, +1\}$ and
equiprobable interference symbols. The mutual information vs. SNR curve for the
interference-free AWGN channel with equiprobable input alphabet $\{-1, +1\}$ is plotted for
comparison. As can be seen, for low SNRs, the input probability assignment
$p_{11} = p_{22} = \frac{1}{2}$ is optimal, whereas at high SNRs, the input probability
assignment $p_{12} = p_{21} = \frac{1}{2}$ is optimal. The maximum achievable rate for
uniform transmission is the upper envelope of the two curves corresponding to the different
input probability assignments. Also, it can be observed that the achievable rate approaches
$\log_2 2 = 1$ bit per channel use as SNR increases, in agreement with the result established
in Section IV for the noise-free channel.

Fig. 5. Maximum mutual information vs. SNR for the channel with
$\mathcal{X} = \mathcal{S} = \{-1, +1\}$ and $r_1 = r_2 = \frac{1}{2}$.

It follows from the proof of Theorem 3 that the optimum solution of the linear
optimization problem, $\mathbf{p}^*$, is a zero-one vector. So, if we add the integrality
constraint to the set of constraints in (16), we still obtain the same optimal solution. The
resulting integer linear optimization problem is called the assignment problem [18], which
can be solved using low-complexity algorithms such as the Hungarian method [19].
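For small $M$, the assignment problem for $Q = 2$ can be explored numerically. The sketch below is illustrative only: it evaluates the coefficients $h_{ij} = h(Y|X_1 = x_i, X_2 = x_j)$ as entropies of two-component Gaussian mixtures using a simple trapezoidal rule, and it replaces the Hungarian method with an exhaustive search over permutations. The function names, the integration range of eight standard deviations, and the number of integration steps are assumptions made for this example.

```python
import math
from itertools import permutations

def mixture_entropy(centers, weights, noise_power, steps=4000):
    """Differential entropy (in bits) of a Gaussian mixture whose components
    share the variance noise_power, computed by the trapezoidal rule."""
    sigma = math.sqrt(noise_power)
    lo, hi = min(centers) - 8 * sigma, max(centers) + 8 * sigma
    dz = (hi - lo) / steps
    norm = math.sqrt(2 * math.pi * noise_power)
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * dz
        m = sum(w * math.exp(-(z - c) ** 2 / (2 * noise_power))
                for c, w in zip(centers, weights)) / norm
        if m > 0.0:
            edge = 0.5 if k in (0, steps) else 1.0
            total -= edge * m * math.log2(m)
    return total * dz

def best_assignment(xs, ss, rs, noise_power):
    """For Q = 2, return the permutation perm minimizing sum_i h[i][perm[i]];
    input (x_i, x_perm[i]) of the associated channel then gets probability 1/M."""
    M = len(xs)
    # Given X_1 = x_i and X_2 = x_j, Y is a mixture centred at x_i + s_1
    # (weight r_1) and x_j + s_2 (weight r_2).
    h = [[mixture_entropy([xs[i] + ss[0], xs[j] + ss[1]], rs, noise_power)
          for j in range(M)] for i in range(M)]
    return min(permutations(range(M)),
               key=lambda perm: sum(h[i][perm[i]] for i in range(M)))
```

With `xs = ss = [-1, 1]` and equiprobable interference, a large noise power such as 16 selects the identity permutation (the assignment $p_{11} = p_{22}$), while a small noise power such as 0.01 selects the swap (the assignment $p_{12} = p_{21}$), which matches the two regimes described for Fig. 5.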

B. Integrality Constraint for the Q-Level Interference

The fact that, for the case $Q = 2$, there exists an optimal $\mathbf{p}$ which is a
zero-one vector with exactly $M$ ones simplifies the encoding operation, because any encoding
scheme just needs to work on a subset of size $M$ of the associated channel input alphabet
with equal probabilities $\frac{1}{M}$.

For $Q > 2$, $\mathbf{A}$ is not a totally unimodular matrix. Therefore, not all extreme
points of the feasible region defined by $\mathbf{A}\mathbf{p} = \mathbf{1}$,
$\mathbf{p} \ge \mathbf{0}$ are integer vectors. However, at the expense of a possible loss
in rate, we may add the integrality constraint (i.e., $\mathbf{p}$ integer) in this case. The
resulting optimization problem is called the multi-dimensional assignment problem [20]. The
optimal solution of (15) with the integrality constraint is a vector with exactly $M$
nonzero elements, each with the value $\frac{1}{M}$. Therefore, any encoding scheme just
needs to use $M$ symbols of the associated channel with equal probabilities, simplifying
the encoding operation.
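The structure of an integral solution can be made concrete with a small exhaustive search. The sketch below is not an efficient algorithm for the multi-dimensional assignment problem; it simply enumerates every feasible integral solution, which is a set of $M$ index tuples in which each coordinate is a permutation of the indices. The function name and the generic `cost` callable (standing in for the coefficients $h_{i_1 \cdots i_Q}$) are hypothetical.

```python
from itertools import permutations, product

def integral_uniform_assignment(M, Q, cost):
    """Brute-force search over integral solutions of the uniform problem.

    cost(t) returns the coefficient for a Q-tuple t of 0-based indices.
    Returns (best_total, best_tuples); the objective value of the linear
    program is best_total / M, since each chosen tuple has probability 1/M.
    """
    best_total, best_tuples = None, None
    # Fix the first coordinate to 0..M-1 and permute the remaining Q-1.
    for perms in product(permutations(range(M)), repeat=Q - 1):
        tuples = [(m,) + tuple(p[m] for p in perms) for m in range(M)]
        total = sum(cost(t) for t in tuples)
        if best_total is None or total < best_total:
            best_total, best_tuples = total, tuples
    return best_total, best_tuples
```

The search visits $(M!)^{Q-1}$ candidates, so it is practical only for very small $M$ and $Q$. In every candidate, the indices of $X_2, \ldots, X_Q$ are determined by the index of $X_1$.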

Fig. 6 depicts the maximum mutual information for uniform transmission with the
integrality constraint vs. SNR for the channel with
$\mathcal{X} = \mathcal{S} = \{-3, -1, +1, +3\}$ and equiprobable interference symbols. The
mutual information vs. SNR curve for the interference-free AWGN channel with equiprobable
input alphabet $\{-3, -1, +1, +3\}$ is plotted for comparison. Interestingly, we obtained
exactly the same curves as in Fig. 6 without imposing the integrality constraint.

It is worth mentioning that, with the integrality constraint, the optimal solution of
(15) is a joint pmf of $X_1, \ldots, X_Q$ for which $X_2, \ldots, X_Q$ can be represented as
functions of $X_1$.

C. Explicit Optimal Solutions 

In the sequel, we further investigate the optimal solution of (15). It can be shown
that the coefficient $h_{i_1 \cdots i_Q} = h(Y|X_1 = x_{i_1}, \ldots, X_Q = x_{i_Q})$ is a
function of $x_{i_1} - x_{i_2}, x_{i_1} - x_{i_3}, \ldots, x_{i_1} - x_{i_Q}$, i.e.,

$$h_{i_1 \cdots i_Q} = g(x_{i_1} - x_{i_2}, x_{i_1} - x_{i_3}, \ldots, x_{i_1} - x_{i_Q}), \tag{17}$$

where $g$ is given by

$$g(u_1, \ldots, u_{Q-1}) = -\int_{-\infty}^{+\infty}
  \left( r_1 f_N(z) + \sum_{q=2}^{Q} r_q f_N(z + u_{q-1} + s_1 - s_q) \right)
  \log_2 \left( r_1 f_N(z) + \sum_{q=2}^{Q} r_q f_N(z + u_{q-1} + s_1 - s_q) \right) dz. \tag{18}$$

Fig. 6. Maximum mutual information vs. SNR for the channel with
$\mathcal{X} = \mathcal{S} = \{-3, -1, +1, +3\}$ and $r_1 = r_2 = r_3 = r_4 = \frac{1}{4}$.

The plot of $g(\cdot)$ for $Q = 2$ with interference probabilities $r_1$, $r_2$ and
parameters $s_1 = -2$, $s_2 = +2$, $P_N = 1$ is shown in Fig. 7. The plot of $g(\cdot)$ for
$Q = 3$ with parameters $r_1 = r_2 = r_3 = \frac{1}{3}$, $s_1 = -2$, $s_2 = 0$, $s_3 = +2$,
$P_N = 1$ is shown in Fig. 8. In the Appendix, it is shown that $g$ is lower-bounded by the
differential entropy of the noise, $h(N)$, and is upper-bounded by $h(N) + H(S)$, where
$H(S)$ is the entropy of the discrete interference.
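The function $g$ and the two bounds can be checked numerically. The following sketch evaluates $g$ by a trapezoidal rule, reading it as the differential entropy of a Gaussian mixture whose first component is centred at zero and whose $q$th component is centred at $-(u_{q-1} + s_1 - s_q)$. The function names, the truncation of the integral at eight standard deviations, and the choice $r_1 = r_2 = \frac{1}{2}$ in the usage note are assumptions made for illustration.

```python
import math

def g(us, ss, rs, noise_power, steps=4000):
    """Numerical evaluation (in bits) of g(u_1, ..., u_{Q-1})."""
    sigma = math.sqrt(noise_power)
    # f_N(z + u_{q-1} + s_1 - s_q) is a Gaussian in z centred at
    # -(u_{q-1} + s_1 - s_q); the first component f_N(z) is centred at 0.
    centers = [0.0] + [-(u + ss[0] - s_q) for u, s_q in zip(us, ss[1:])]
    lo, hi = min(centers) - 8 * sigma, max(centers) + 8 * sigma
    dz = (hi - lo) / steps
    norm = math.sqrt(2 * math.pi * noise_power)
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * dz
        m = sum(r * math.exp(-(z - c) ** 2 / (2 * noise_power))
                for c, r in zip(centers, rs)) / norm
        if m > 0.0:
            edge = 0.5 if k in (0, steps) else 1.0
            total -= edge * m * math.log2(m)
    return total * dz

def noise_entropy(noise_power):
    """h(N) in bits for Gaussian noise with the given power."""
    return 0.5 * math.log2(2 * math.pi * math.e * noise_power)

def interference_entropy(rs):
    """H(S) in bits for interference probabilities rs."""
    return -sum(r * math.log2(r) for r in rs if r > 0)
```

For example, with `ss = [-2, 2]`, `rs = [0.5, 0.5]` and unit noise power, `g([4.0], ss, rs, 1.0)` coincides with `noise_entropy(1.0)` up to numerical error, because the two mixture components then overlap exactly, and the values of `g` for `u` between -6 and 6 stay between the lower and upper bounds.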

We may assume that $x_1$ and $x_M$ are the smallest and the largest elements of the
input alphabet $\mathcal{X}$, respectively. Then the following theorem gives an explicit
solution to (15) under some circumstances.

Theorem 4: If $g$ is convex in the $(Q-1)$-cube
$\{(u_1, \ldots, u_{Q-1}) : x_1 - x_M \le u_i \le x_M - x_1,\ i = 1, 2, \ldots, Q - 1\}$,
then the optimal solution to (15) is

$$p_{i_1 \cdots i_Q} = \begin{cases} \frac{1}{M}, & \text{if } i_1 = i_2 = \cdots = i_Q, \\ 0, & \text{otherwise.} \end{cases} \tag{19}$$



Fig. 7. The plot of $g(u)$ for interference probabilities $r_1$, $r_2$ and $s_1 = -2$,
$s_2 = +2$, $P_N = 1$.



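Proof: Define the random variables $U_i = X_1 - X_{i+1}$, $i = 1, \ldots, Q - 1$, and let
$u_{j_1}, \ldots, u_{j_{Q-1}}$ range over their possible values. The objective function in
(15) can be written as

$$\begin{aligned}
& \sum_{i_1=1}^{M} \cdots \sum_{i_Q=1}^{M} \Pr\{X_1 = x_{i_1}, \ldots, X_Q = x_{i_Q}\}\,
  g(x_{i_1} - x_{i_2}, \ldots, x_{i_1} - x_{i_Q}) \\
&= \sum_{j_1} \cdots \sum_{j_{Q-1}} \sum_{i_1=1}^{M}
  \Pr\{X_1 = x_{i_1}, X_2 = x_{i_1} - u_{j_1}, \ldots, X_Q = x_{i_1} - u_{j_{Q-1}}\}\,
  g(u_{j_1}, \ldots, u_{j_{Q-1}}) \\
&= \sum_{j_1} \cdots \sum_{j_{Q-1}} \sum_{i_1=1}^{M}
  \Pr\{X_1 = x_{i_1}, X_1 - X_2 = u_{j_1}, \ldots, X_1 - X_Q = u_{j_{Q-1}}\}\,
  g(u_{j_1}, \ldots, u_{j_{Q-1}}) \\
&= \sum_{j_1} \cdots \sum_{j_{Q-1}} \sum_{i_1=1}^{M}
  \Pr\{X_1 = x_{i_1}, U_1 = u_{j_1}, \ldots, U_{Q-1} = u_{j_{Q-1}}\}\,
  g(u_{j_1}, \ldots, u_{j_{Q-1}}) \\
&= \sum_{j_1} \cdots \sum_{j_{Q-1}}
  \Pr\{U_1 = u_{j_1}, \ldots, U_{Q-1} = u_{j_{Q-1}}\}\,
  g(u_{j_1}, \ldots, u_{j_{Q-1}}) \\
&= E[g(U_1, \ldots, U_{Q-1})],
\end{aligned} \tag{20}$$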
where $E[\cdot]$ denotes the expectation operator. Now, considering the convexity of $g$, we
apply Jensen's inequality:

$$E[g(U_1, \ldots, U_{Q-1})] \ge g(E[U_1, \ldots, U_{Q-1}]). \tag{21}$$

Under the uniformity constraint, $X_1, \ldots, X_Q$ have the same mean, so $E[U_i] = 0$ for
all $i$ and the right-hand side of (21) equals $g(0, \ldots, 0)$. Equality holds when the
random variables $U_1, \ldots, U_{Q-1}$ take the value zero with probability one, or
equivalently,

$$X_1 = X_2 = \cdots = X_Q. \tag{22}$$

The joint pmf in (19) satisfies both the constraints in (15) and (22), so it is the optimal
solution. ■

Fig. 8. The plot of $g(u_1, u_2)$ with parameters $r_1 = r_2 = r_3 = \frac{1}{3}$,
$s_1 = -2$, $s_2 = 0$, $s_3 = +2$, $P_N = 1$.
For $Q = 2$, the convexity of $g$ in the interval $[x_1 - x_M, x_M - x_1]$ is equivalent to

$$x_M - x_1 \le s_1 - s_2 + u^* \sqrt{P_N}, \tag{23}$$

where $u^* \approx 1.636$ and $s_1 < s_2$. The proof can be found in the Appendix. In general
($Q > 2$), when the noise power $P_N$ is sufficiently large, $g$ will be convex in the
$(Q-1)$-cube.
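As a worked example of the convexity condition for $Q = 2$, consider the 4-PAM alphabet $\mathcal{X} = \{-3, -1, +1, +3\}$ with $s_1 = -2$ and $s_2 = +2$. The calculation below assumes the condition reads $x_M - x_1 \le s_1 - s_2 + u^* \sqrt{P_N}$ with the approximate value $u^* \approx 1.636$, so the resulting threshold is approximate as well.

```latex
\begin{align*}
x_M - x_1 = 6 &\le s_1 - s_2 + u^* \sqrt{P_N} = -4 + 1.636 \sqrt{P_N} \\
\Longleftrightarrow\quad \sqrt{P_N} &\ge \frac{10}{1.636} \approx 6.11 \\
\Longleftrightarrow\quad P_N &\gtrsim 37.4 .
\end{align*}
```

Under these assumptions, $g$ is convex on the whole interval $[-6, 6]$ only when the noise power is roughly 37.4 or larger, which is far above the value $P_N = 1$ used in the example figures.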

Theorem 4 has an interesting interpretation: when the condition of Theorem 4 is
satisfied, the optimal precoder sends the same symbol over the channel regardless of the
current interference symbol. In other words, the optimal precoder for uniform transmission
ignores the interference. In fact, as can be seen from (21), any transmission scheme
that forces $X_1, \ldots, X_Q$ to have the same statistical average does not benefit from the
causal knowledge of interference symbols at the transmitter if the condition of Theorem
4 is satisfied. Note that this might not hold true for a capacity-achieving coding scheme
without any constraints on the marginal pmfs of $X_1, \ldots, X_Q$.

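The following theorem holds for the case $Q = 2$ and when the input alphabet $\mathcal{X}$ is symmetric w.r.t. the origin, i.e.,

$$x_i = -x_{M+1-i}, \qquad i = 1, \ldots, M. \qquad (24)$$

For example, a regular PAM constellation satisfies (24).

Theorem 5: If the input alphabet $\mathcal{X}$ is symmetric w.r.t. the origin, and if $g$ is concave in the interval $[x_1 - x_M, x_M - x_1]$, then

$$p_{ij} = \begin{cases} \dfrac{1}{M}, & \text{if } i + j = M + 1, \\ 0, & \text{otherwise} \end{cases} \qquad (25)$$

is an optimal solution to (15).

Proof: We rewrite (15) for the case $Q = 2$ as

$$\begin{aligned}
\min_{p_{ij}} \quad & \sum_{i=1}^{M} \sum_{j=1}^{M} h_{ij}\, p_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{M} p_{ij} = \frac{1}{M}, \qquad i = 1, 2, \ldots, M, \\
& \sum_{i=1}^{M} p_{ij} = \frac{1}{M}, \qquad j = 1, 2, \ldots, M, \\
& p_{ij} \ge 0, \qquad i, j = 1, 2, \ldots, M. \qquad (26)
\end{aligned}$$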

We assign $p_{ij}$ to the element $(i, j)$ of an $M$ by $M$ array (see the corresponding figure). The equality constraints of (26) mean that every row and every column of the array adds up to $\frac{1}{M}$. We make the observation that if $\{p_{ij}\}_{i,j=1,2,\ldots,M}$ is a feasible solution of (26), then $\{q_{ij}\}_{i,j=1,2,\ldots,M}$, where $q_{ij} = p_{(M+1-j)(M+1-i)}$, will also be a feasible solution of (26). Furthermore, due to (24) and the fact that $h_{ij} = g(x_i - x_j)$, $\{p_{ij}\}$ and $\{q_{ij}\}$ yield the same objective value. Therefore, if $\{p_{ij}\}$ is an optimal solution of (26), $\{q_{ij}\}$ will be an optimal solution too. The convex combination of the two optimal solutions, $\{\theta_{ij} = \frac{1}{2} p_{ij} + \frac{1}{2} q_{ij}\}$, is also an optimal solution with the following symmetry property:

$$\theta_{ij} = \theta_{(M+1-j)(M+1-i)}. \qquad (27)$$

In fact, (27) describes a solution which is symmetric w.r.t. the main diagonal of the array, that is, the entries with $i + j = M + 1$. So far, we have established the existence of an optimal solution to (26) with the symmetry property (27). Now, suppose that a symmetric optimal solution to (26) has nonzero entries

$$p_{ij} = p_{(M+1-j)(M+1-i)} = p, \qquad (28)$$

where $i + j \ne M + 1$. Now, if we add $p$ to the main diagonal entries $p_{(M+1-j)j}$ and $p_{i(M+1-i)}$ and turn $p_{ij}$ and $p_{(M+1-j)(M+1-i)}$ to zero, the constraints of (26) are not violated.






However, the change in the objective function will be proportional to

$$h(Y|X_1 = x_i, X_2 = x_{M+1-i}) + h(Y|X_1 = x_{M+1-j}, X_2 = x_j) - h(Y|X_1 = x_i, X_2 = x_j) - h(Y|X_1 = x_{M+1-j}, X_2 = x_{M+1-i}),$$

which, by (24), equals $g(2x_i) + g(-2x_j) - 2g(x_i - x_j)$. This quantity is non-positive by the concavity of $g$, since $x_i - x_j$ is the midpoint of $2x_i$ and $-2x_j$. Hence, we have not increased the objective value by the process described above. We can repeat the process until all nonzero entries lie on the main diagonal without increasing the objective value. Therefore, (25) is an optimal solution of (26). ■
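The contrast between Theorem 4 and Theorem 5 can be illustrated with a small brute-force search. The sketch below restricts attention to the integral solutions of (26) for $Q = 2$, i.e. pairings in which each $x_i$ is matched with exactly one $x_j$ with probability $\frac{1}{M}$. The function `best_pairing` and the stand-in functions `convex_g` and `concave_g` are hypothetical: they are simple quadratic placeholders for a convex or concave $g$, not the differential-entropy function of the paper.

```python
from itertools import permutations


def best_pairing(points, g):
    """Brute-force the pairing j = perm[i] minimizing sum_i g(x_i - x_perm[i]) / M.

    Each permutation stands for a solution of (26) with p_ij = 1/M on the
    paired entries and zero elsewhere (the integral solutions for Q = 2).
    """
    m = len(points)

    def cost(perm):
        return sum(g(points[i] - points[perm[i]]) for i in range(m)) / m

    return min(permutations(range(m)), key=cost)


pam = [-3.0, -1.0, 1.0, 3.0]        # symmetric 4-PAM, satisfies (24)
convex_g = lambda u: u * u           # hypothetical convex stand-in for g
concave_g = lambda u: -(u * u)       # hypothetical concave stand-in for g

print(best_pairing(pam, convex_g))   # pairs each point with itself, as in (19)
print(best_pairing(pam, concave_g))  # pairs x_i with x_{M+1-i}, as in (25)
```

With the convex stand-in the search returns the identity pairing, and with the concave stand-in it returns the reversed pairing, matching the structure of (19) and (25) respectively. The exhaustive search grows as $M!$ and is meant only for small constellations.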
It can be shown that $g$ is concave in the interval $[x_1 - x_M, x_M - x_1]$ if and only if

$$x_M - x_1 \le s_2 - s_1 - u^* \sqrt{P_N}. \qquad (29)$$

See Appendix II for the proof.

VI. Optimal Precoding

The general structure of a communication system for the channel defined in © is shown in fig. 9. In fact, fig. 9 is the same as fig. 2 for the special case of the state-dependent channel defined in ©. Any encoding and decoding scheme for the associated channel can be translated to an encoding and decoding scheme for the original channel defined in ©. A message $w$ is encoded to a block of length $n$ composed of input symbols of the associated channel $t = (x_{i_1}, x_{i_2}, \ldots, x_{i_Q})$. There are $M^Q$ input symbols. However, we showed that the maximum rate with uniformity and integrality constraints can be achieved by using just $M$ input symbols of the associated channel with equal probabilities. The optimal $M$ input symbols of the associated channel are obtained by solving the linear programming problem (15) with the integrality constraint. Those $M$ input symbols of the associated channel define the optimal precoding operation: for any $t$ that belongs to the set of $M$ optimal input symbols, the precoder sends the $q$th component of $t$ if the current interference symbol is $s_q$, $q = 1, \ldots, Q$. Based on the received sequence, the receiver decodes $\hat{w}$ as the transmitted message.
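The precoding operation of this section amounts to a table lookup: the encoder selects an input symbol $t$ of the associated channel, and the precoder transmits the component of $t$ indexed by the current interference symbol. The following minimal sketch illustrates this under stated assumptions. The names `make_precoder`, `symbols`, `encoded_block` and `interference` are hypothetical, and the set of $M$ symbols used here is the pairing of (25) for a 4-PAM alphabet with $Q = 2$, chosen only as an example rather than obtained by solving (15).

```python
def make_precoder(codebook_symbols):
    """Return a precoder for a list of associated-channel symbols t = (x_1, ..., x_Q)."""
    q_levels = len(codebook_symbols[0])
    if any(len(t) != q_levels for t in codebook_symbols):
        raise ValueError("every symbol t must have Q components")

    def precode(symbol_index, interference_index):
        # Send the q-th component of t when the current interference symbol is s_q
        # (interference_index is 0-based here).
        return codebook_symbols[symbol_index][interference_index]

    return precode


# Hypothetical example with M = 4 and Q = 2: the pairing of (25), x_j = -x_i.
pam = [-3.0, -1.0, 1.0, 3.0]
symbols = [(x, -x) for x in pam]
precode = make_precoder(symbols)

encoded_block = [0, 3, 1]    # indices of t chosen by the encoder (illustrative)
interference = [1, 0, 1]     # causally observed interference indices (illustrative)
channel_inputs = [precode(t, q) for t, q in zip(encoded_block, interference)]
print(channel_inputs)
```

Because each component of the $M$ symbols runs over the whole alphabet exactly once, using the symbols with equal probabilities makes the channel input uniform for every interference level, which is the uniformity constraint of the scheme.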



[Block diagram with blocks labeled Encoder, Precoder and Decoder, and noise N.]

Fig. 9. General structure of the communication system for channels with causally-known discrete interference.



VII. Extension to Continuous Input Alphabet 

We can extend the uniform transmission scheme introduced in section V to the case where the channel input alphabet $\mathcal{X}$ is continuous. For the continuous input alphabet case, we consider the maximization of the transmission rate $I(X_1 \cdots X_Q; Y)$ over joint pdfs $f_{X_1 \cdots X_Q}(x_1, \ldots, x_Q)$ that induce uniform marginal distributions on $X_1, \ldots, X_Q$ in the interval $A_\Delta = [-\frac{\Delta}{2}, \frac{\Delta}{2}]$.

Since $h(Y)$ is the same for all joint pdfs $f_{X_1 \cdots X_Q}(x_1, \ldots, x_Q)$ that induce uniform marginal pdfs on $X_1, \ldots, X_Q$, the maximization of the transmission rate reduces to the linear minimization problem

$$\begin{aligned}
\min_{f_{X_1 \cdots X_Q}} \quad & \int_{A_\Delta^Q} h(x_1, \ldots, x_Q)\, f_{X_1 \cdots X_Q}(x_1, \ldots, x_Q)\, dx_1 \cdots dx_Q \\
\text{subject to} \quad & \int f_{X_1 \cdots X_Q}(x_1, \ldots, x_Q)\, dx_2 \cdots dx_Q = \frac{1}{\Delta}, \qquad x_1 \in A_\Delta, \\
& \qquad \vdots \\
& \int f_{X_1 \cdots X_Q}(x_1, \ldots, x_Q)\, dx_1 \cdots dx_{Q-1} = \frac{1}{\Delta}, \qquad x_Q \in A_\Delta, \\
& f_{X_1 \cdots X_Q}(x_1, \ldots, x_Q) \ge 0, \qquad x_1, \ldots, x_Q \in A_\Delta, \qquad (30)
\end{aligned}$$



where $h(x_1, \ldots, x_Q) = h(Y|X_1 = x_1, \ldots, X_Q = x_Q)$. We are interested in solutions to (30) that are of the form

$$f_{X_1 \cdots X_Q}(x_1, \ldots, x_Q) = \frac{1}{\Delta}\, \delta\big(|x_2 - \xi_1(x_1)| + |x_3 - \xi_2(x_1)| + \cdots + |x_Q - \xi_{Q-1}(x_1)|\big), \qquad (31)$$

where $\delta(\cdot)$ is the Dirac delta function, $|\cdot|$ denotes absolute value, and $\xi_1, \xi_2, \ldots, \xi_{Q-1}$ are bijective functions from $A_\Delta$ to $A_\Delta$.

The joint pdf in (31) describes random variables $X_1, \ldots, X_Q$, $Q - 1$ of which are functions of the other random variable. Solutions of the form (31) can be considered as the continuous extension of solutions to (15) with the integrality constraint for the discrete input alphabet case. It is easy to check that (31), with the given condition that $\xi_1, \xi_2, \ldots, \xi_{Q-1}$ are bijective functions from $A_\Delta$ to $A_\Delta$, satisfies the constraints in (30). The objective value corresponding to the joint pdf (31) is

$$\frac{1}{\Delta} \int_{-\Delta/2}^{\Delta/2} h(x_1, \xi_1(x_1), \ldots, \xi_{Q-1}(x_1))\, dx_1, \qquad (32)$$

which is to be minimized over bijective functions $\xi_1, \xi_2, \ldots, \xi_{Q-1}$.



A. Comparison to Modulo Precoding 

The modulo precoding was originally proposed by Tomlinson and Harashima [21], [22] for the ISI channel. It was then extended in [2] as a precoding method for channels with known (discrete or continuous) interference at the transmitter. The main idea is as follows. Based on the input symbol of the associated channel $V$ and the current interference symbol $S$, the precoder sends [2]

$$X = [V - \alpha S] \bmod \Delta, \qquad (33)$$

where $\alpha = \frac{P_X}{P_X + P_N}$ ($P_X$ is the power of $X$) and $V$ is distributed uniformly in $A_\Delta$. In our setting where the interference is discrete with $Q$ levels, (33) results in

$$X_q = [V - \alpha s_q] \bmod \Delta, \qquad q = 1, \ldots, Q, \qquad (34)$$

where $X_q$ is the random variable that represents the channel input when the current interference symbol is $s_q$, $q = 1, \ldots, Q$. Since $V$ is uniformly distributed in $A_\Delta$, $X_1, \ldots, X_Q$ will be uniformly distributed in $A_\Delta$. Therefore, modulo precoding is indeed a uniform transmission scheme. We can remove $V$ from the above equations and express $X_2, \ldots, X_Q$ in terms of $X_1$ as

$$X_q = [X_1 + \alpha(s_1 - s_q)] \bmod \Delta, \qquad q = 2, \ldots, Q. \qquad (35)$$

Since $X_2, \ldots, X_Q$ are functions of $X_1$, the joint pdf $f_{X_1 \cdots X_Q}(x_1, \ldots, x_Q)$ corresponding to the modulo precoding fits in the category of joint pdfs in (31). The bijective functions corresponding to the modulo precoding are given by (35). These functions are circular shifts of each other.
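The step from (34) to (35) can be checked numerically. The sketch below assumes a centered modulo operation that maps into $[-\frac{\Delta}{2}, \frac{\Delta}{2})$, which is one common convention and is an assumption here. The helper names and the values of $\Delta$, $\alpha$ and the interference levels are illustrative, not taken from the text. Results are compared through a circular distance so that floating-point rounding at the wrap-around point does not matter.

```python
def centered_mod(x, delta):
    """Reduce x modulo delta into the interval [-delta/2, delta/2)."""
    return (x + delta / 2.0) % delta - delta / 2.0


def modulo_precoder(v, s, alpha, delta):
    """Equation (34): X_q = [V - alpha * s_q] mod Delta."""
    return centered_mod(v - alpha * s, delta)


def shift_from_x1(x1, s1, sq, alpha, delta):
    """Equation (35): X_q = [X_1 + alpha * (s_1 - s_q)] mod Delta."""
    return centered_mod(x1 + alpha * (s1 - sq), delta)


def circular_distance(a, b, delta):
    """Distance between a and b on a circle of circumference delta."""
    d = abs(a - b) % delta
    return min(d, delta - d)


# Illustrative values (assumptions, not taken from the text).
delta, alpha = 2.0, 0.25
levels = [-1.0, 0.0, 1.0]                  # s_1, s_2, s_3
for k in range(-10, 10):
    v = k / 10.0                           # V in [-1, 1)
    x1 = modulo_precoder(v, levels[0], alpha, delta)
    for sq in levels[1:]:
        direct = modulo_precoder(v, sq, alpha, delta)
        via_x1 = shift_from_x1(x1, levels[0], sq, alpha, delta)
        assert circular_distance(direct, via_x1, delta) < 1e-9
```

Each $X_q$ is obtained from $X_1$ by a fixed circular shift of $\alpha(s_1 - s_q)$, which is why the corresponding functions are bijective on $A_\Delta$.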

The modulo precoding corresponds to a feasible solution to (30) which is not, in general, an optimal solution. For example, we may follow the line of proof of Theorem 4 to show that for large $P_N$, where $g$ becomes convex in the hyper-cube $\{(u_1, \ldots, u_{Q-1}) : -\Delta \le u_i \le \Delta,\ i = 1, \ldots, Q-1\}$, the optimal bijective functions are given by $\xi_1(x) = \cdots = \xi_{Q-1}(x) = x$, which are different from the functions given in (35).

To make the example more specific, consider a channel with $\mathcal{X} = A_\Delta = [-1, +1]$ and $\mathcal{S} = \{-\frac{1}{2}, +\frac{1}{2}\}$. According to (23), $g(u)$ will be convex if we choose $P_N = 3.363$. Then we will have $\alpha = \frac{P_X}{P_X + P_N} = \frac{0.333}{0.333 + 3.363} \approx 0.09$. Therefore, the bijective function corresponding to modulo precoding is given by

$$X_2 = [X_1 - 0.09] \bmod 2, \qquad (36)$$

while the optimal precoding corresponds to $X_2 = X_1$ in this example.
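The comparison in this example can be reproduced numerically by evaluating the objective (32) for both choices of the bijective function. The sketch below is written under stated assumptions: the two interference symbols are taken to be equiprobable, $g$ is computed by trapezoidal integration of the entropy of a two-component Gaussian mixture whose components are $u + s_1 - s_2$ apart (the dependence described in Appendix II), the modulo operation is centered, and $P_X = \Delta^2/12$ is the power of a uniform input on $A_\Delta$. All function names are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def g(u, s1, s2, noise_power, steps=4000):
    """Differential entropy (bits) of the channel output for Q = 2, equiprobable s_1, s_2.

    u = x_1 - x_2 is the difference of the two channel inputs, so the two
    Gaussian components are centered d = u + s_1 - s_2 apart.
    """
    sigma = math.sqrt(noise_power)
    d = u + s1 - s2
    lo, hi = min(0.0, d) - 10.0 * sigma, max(0.0, d) + 10.0 * sigma
    step = (hi - lo) / steps
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * step
        f = 0.5 * norm * (math.exp(-0.5 * (y / sigma) ** 2)
                          + math.exp(-0.5 * ((y - d) / sigma) ** 2))
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoidal rule
        total -= weight * f * math.log2(f) * step
    return total


def centered_mod(x, delta):
    """Reduce x modulo delta into the interval [-delta/2, delta/2)."""
    return (x + delta / 2.0) % delta - delta / 2.0


def objective(xi, delta, s1, s2, noise_power, grid=200):
    """Midpoint-rule estimate of (32) for Q = 2, using h(x_1, x_2) = g(x_1 - x_2)."""
    width = delta / grid
    total = 0.0
    for k in range(grid):
        x = -delta / 2.0 + (k + 0.5) * width
        u = round(x - xi(x), 12)   # rounding lets the cache reuse repeated values
        total += g(u, s1, s2, noise_power) * width
    return total / delta


delta, s1, s2, noise_power = 2.0, -0.5, 0.5, 3.363
alpha = (delta ** 2 / 12.0) / (delta ** 2 / 12.0 + noise_power)   # P_X = Delta^2 / 12

identity_value = objective(lambda x: x, delta, s1, s2, noise_power)
modulo_value = objective(lambda x: centered_mod(x + alpha * (s1 - s2), delta),
                         delta, s1, s2, noise_power)
print(round(alpha, 3), identity_value, modulo_value)
```

Since (32) is to be minimized, a smaller printed value corresponds to a higher transmission rate. Under the assumptions above, the identity mapping $X_2 = X_1$ gives the smaller value, in line with the discussion of this example.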

VIII. Conclusion 

In this paper, we investigated $M$-ary signal transmission over the AWGN channel with additive $Q$-level interference, where the sequence of i.i.d. interference symbols is known causally at the transmitter. According to Shannon's theorem for channels with side information at the transmitter, the capacity of our channel is the same as the capacity of an associated regular (without state) channel with $M^Q$ input symbols. We proved that by using at most $MQ - Q + 1$ (out of $M^Q$) input symbols the capacity is achievable.

For the noise-free channel, provided that the signal points are equally spaced, we proposed a one-shot coding scheme that uses $M$ input symbols of the associated channel to achieve the capacity $\log_2 M$ bits regardless of the interference.

We considered the maximization of the transmission rate with the constraint that $X_1, \ldots, X_Q$ are uniformly distributed over the channel input alphabet. For this so-called uniform transmission, the optimal input probability assignment (again with at most $MQ - Q + 1$ nonzero elements) can be obtained by solving the linear optimization problem (15). The optimal solution to (15) with the integrality constraint has exactly $M$ nonzero elements. For the case $Q = 2$, we showed that the integrality constraint does not reduce the maximum achievable rate. The loss in rate (if there is any) caused by imposing the integrality constraint in the general case is a problem to be explored.



Appendix I

Bounds for $h(Y|X_1 = x_{i_1}, \ldots, X_Q = x_{i_Q})$

Denote by $\tilde{S}$ the random variable that takes on $x_{i_1} + s_1, x_{i_2} + s_2, \ldots, x_{i_Q} + s_Q$ with probabilities $r_1, r_2, \ldots, r_Q$, respectively. Also, denote by $\tilde{Y}$ the random variable $Y|X_1 = x_{i_1}, \ldots, X_Q = x_{i_Q}$. Then

$$\tilde{Y} = \tilde{S} + N. \qquad (37)$$

Since

$$0 \le I(\tilde{Y}; \tilde{S}) \le H(\tilde{S}), \qquad (38)$$

we have

$$0 \le h(\tilde{Y}) - h(\tilde{Y}|\tilde{S}) \le H(\tilde{S}) \qquad (39)$$

or equivalently,

$$h(N) \le h(\tilde{Y}) \le h(N) + H(\tilde{S}). \qquad (40)$$

Appendix II

Necessary and Sufficient Conditions for the Convexity/Concavity of $g$

The function $g$ given in (18) for the case $Q = 2$ can be considered as a function of $u$ and parameters $s_1$, $s_2$, $P_N$ as

$$g(u, s_1, s_2, P_N) = g(u + s_1 - s_2, 0, 0, P_N) = g\!\left(\frac{u + s_1 - s_2}{\sqrt{P_N}}, 0, 0, 1\right) + \log_2 \sqrt{P_N}. \qquad (41)$$



Denote by $u_0$ and $-u_0$ the inflection points of $g(u, 0, 0, 1)$. We can obtain $u_0$ numerically as $u_0 \approx 1.636$. Then the inflection points of $g(u)$ are

$$\alpha_1 = s_2 - s_1 - u_0 \sqrt{P_N}, \qquad (42)$$
$$\alpha_2 = s_2 - s_1 + u_0 \sqrt{P_N}. \qquad (43)$$

The function $g$ is convex in the interval $[\alpha_1, \alpha_2]$ and is concave anywhere else.
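The convex-then-concave shape of the normalized function $g(u, 0, 0, 1)$, and the bounds (40) of Appendix I, can be examined numerically. The sketch below assumes two equiprobable interference symbols, so that $H(\tilde{S})$ is at most 1 bit. It evaluates the mixture entropy by trapezoidal integration and locates the sign change of a central second difference by bisection. The function names, the finite-difference step and the search bracket are hypothetical choices, and the value printed at the end is a numerical estimate rather than a result taken from the text.

```python
import math


def mixture_entropy_bits(d, steps=2000):
    """g(d, 0, 0, 1): entropy in bits of 0.5*N(0,1) + 0.5*N(d,1) (equal weights assumed)."""
    lo, hi = min(0.0, d) - 10.0, max(0.0, d) + 10.0
    step = (hi - lo) / steps
    norm = 1.0 / math.sqrt(2.0 * math.pi)
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * step
        f = 0.5 * norm * (math.exp(-0.5 * y * y) + math.exp(-0.5 * (y - d) ** 2))
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoidal rule
        total -= weight * f * math.log2(f) * step
    return total


def second_difference(d, eps=0.05):
    """Central second difference; its sign indicates local convexity or concavity."""
    return (mixture_entropy_bits(d + eps) - 2.0 * mixture_entropy_bits(d)
            + mixture_entropy_bits(d - eps))


def locate_inflection(lo=0.5, hi=4.0, iterations=30):
    """Bisection on the sign change of the second difference between lo and hi."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if second_difference(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


h_noise = 0.5 * math.log2(2.0 * math.pi * math.e)   # h(N) for unit noise power
for d in (0.0, 1.0, 3.0, 6.0):
    value = mixture_entropy_bits(d)
    assert h_noise - 1e-9 <= value <= h_noise + 1.0 + 1e-9   # bounds (40), H(S) <= 1 bit
print(locate_inflection())
```

If the equal-weight assumption matches the setting of the paper, the printed estimate should be comparable to the quoted $u_0$. The scaling relation (41) then carries the result over to any $s_1$, $s_2$ and $P_N$.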

The function $g$ is convex in the interval $[x_1 - x_M, x_M - x_1]$ if and only if $[x_1 - x_M, x_M - x_1] \subseteq [\alpha_1, \alpha_2]$. This gives (23).

The function $g$ is concave in the interval $[x_1 - x_M, x_M - x_1]$ if and only if $[x_1 - x_M, x_M - x_1] \subseteq (-\infty, \alpha_1]$ or $[x_1 - x_M, x_M - x_1] \subseteq [\alpha_2, \infty)$. This gives (29).